33 research outputs found

    Quantifying the Impact of Non-Stationarity in Reinforcement Learning-Based Traffic Signal Control

    In reinforcement learning (RL), dealing with non-stationarity is a challenging issue. However, some domains such as traffic optimization are inherently non-stationary. Causes for and effects of this are manifold. In particular, when dealing with traffic signal controls, addressing non-stationarity is key since traffic conditions change over time and as a function of traffic control decisions taken in other parts of a network. In this paper we analyze the effects that different sources of non-stationarity have in a network of traffic signals, in which each signal is modeled as a learning agent. More precisely, we study both the effects of changing the context in which an agent learns (e.g., a change in flow rates experienced by it), as well as the effects of reducing agent observability of the true environment state. Partial observability may cause distinct states (in which distinct actions are optimal) to be seen as the same by the traffic signal agents. This, in turn, may lead to sub-optimal performance. We show that the lack of suitable sensors to provide a representative observation of the real state seems to affect the performance more drastically than the changes to the underlying traffic patterns. Comment: 13 page

    Sample-Efficient Multi-Objective Learning via Generalized Policy Improvement Prioritization

    Multi-objective reinforcement learning (MORL) algorithms tackle sequential decision problems where agents may have different preferences over (possibly conflicting) reward functions. Such algorithms often learn a set of policies (each optimized for a particular agent preference) that can later be used to solve problems with novel preferences. We introduce a novel algorithm that uses Generalized Policy Improvement (GPI) to define principled, formally-derived prioritization schemes that improve sample-efficient learning. They implement active-learning strategies by which the agent can (i) identify the most promising preferences/objectives to train on at each moment, to more rapidly solve a given MORL problem; and (ii) identify which previous experiences are most relevant when learning a policy for a particular agent preference, via a novel Dyna-style MORL method. We prove our algorithm is guaranteed to always converge to an optimal solution in a finite number of steps, or an ε-optimal solution (for a bounded ε) if the agent is limited and can only identify possibly sub-optimal policies. We also prove that our method monotonically improves the quality of its partial solutions while learning. Finally, we introduce a bound that characterizes the maximum utility loss (with respect to the optimal solution) incurred by the partial solutions computed by our method throughout learning. We empirically show that our method outperforms state-of-the-art MORL algorithms in challenging multi-objective tasks, both with discrete and continuous state and action spaces. Comment: Accepted to AAMAS 202
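For readers unfamiliar with GPI, the following is a minimal sketch of the basic GPI action rule under a linear preference vector. It is not the paper's prioritization scheme. The function name `gpi_action`, the nested-list layout of `q_values`, and the toy numbers are all assumptions made for illustration: `q_values[i][a]` is taken to be the vector-valued return estimate of stored policy `i` for action `a` in the current state, and `w` is the preference weight vector.

```python
def gpi_action(q_values, w):
    """Pick the action maximizing max_i w . Q_i(s, a) over stored policies i.

    q_values[i][a] is a list of per-objective value estimates (hypothetical
    layout) for policy i and action a in the current state.
    """
    n_actions = len(q_values[0])

    def scalarize(q_vec):
        # Linear utility: dot product of preference weights and objectives.
        return sum(wi * qi for wi, qi in zip(w, q_vec))

    # For each action, take the best scalarized value across all policies.
    best_per_action = [
        max(scalarize(policy_q[a]) for policy_q in q_values)
        for a in range(n_actions)
    ]
    # Return the index of the action with the highest such value.
    return max(range(n_actions), key=best_per_action.__getitem__)


# Toy example: two stored policies, two actions, two objectives.
Q = [
    [[1.0, 0.0], [0.2, 0.2]],  # policy 0: action 0, action 1
    [[0.0, 0.3], [0.0, 1.0]],  # policy 1: action 0, action 1
]
action_for_first_objective = gpi_action(Q, [0.8, 0.2])
action_for_second_objective = gpi_action(Q, [0.1, 0.9])
```

The same stored policies yield different greedy actions as the preference vector changes, which is what allows a fixed policy set to be reused for novel preferences.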

    Analysis of single nucleotide polymorphisms in the FAS and CTLA-4 genes of peripheral T-cell lymphomas

    Angioimmunoblastic T-cell lymphoma (AILT) represents a subset of T-cell lymphomas but resembles an autoimmune disease in many of its clinical aspects. Despite the phenotype of effector T-cells and high expression of FAS and CTLA-4 receptor molecules, tumor cells fail to undergo apoptosis. We investigated single nucleotide polymorphisms (SNPs) of the FAS and CTLA-4 genes in 94 peripheral T-cell lymphomas. Although allelic frequencies of some FAS SNPs were enriched in AILT cases, none of these occurred at a different frequency compared to healthy individuals. Therefore, SNPs in these genes are not associated with the apoptotic defect and autoimmune phenomena in AILT.

    Parameterized Melody Generation with Autoencoders and Temporally-Consistent Noise

    We introduce a machine learning technique to autonomously generate novel melodies that are variations of an arbitrary base melody. These are produced by a neural network that ensures that (with high probability) the melodic and rhythmic structure of the new melody is consistent with a given set of sample songs. We train a Variational Autoencoder network to identify a low-dimensional set of variables that allows for the compression and representation of sample songs. By perturbing these variables with Perlin Noise, a temporally-consistent parameterized noise function, it is possible to generate smoothly-changing novel melodies. We show that (1) by regulating the amount of noise, one can specify how much of the base song will be preserved; and (2) there is a direct correlation between the noise signal and the differences between the statistical properties of novel melodies and the original one. Users can interpret the controllable noise as a type of “creativity knob”: the higher it is, the more leeway the network has to generate significantly different melodies. We present a physical prototype that allows musicians to use a keyboard to provide base melodies and to adjust the network’s “creativity knobs” to regulate in real-time the process that proposes new melody ideas.
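The idea of perturbing latent variables with temporally-consistent noise can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the latent vector `z` stands in for the output of a trained encoder, the decoder is omitted, and the names `make_perlin_1d`, `perturb_latents`, `amplitude`, and `frequency` are hypothetical. The noise is a simple 1D Perlin-style gradient noise, with `amplitude` playing the role of the "creativity knob".

```python
import math
import random


def make_perlin_1d(seed=0, n=256):
    """Return a 1D Perlin-style gradient noise function (illustrative)."""
    rng = random.Random(seed)
    grads = [rng.uniform(-1.0, 1.0) for _ in range(n)]

    def fade(t):
        # Smoothstep-like curve 6t^5 - 15t^4 + 10t^3 used in Perlin noise.
        return t * t * t * (t * (t * 6 - 15) + 10)

    def noise(x):
        i0 = math.floor(x)
        t = x - i0
        # Contributions of the gradients at the two neighbouring lattice points.
        d0 = grads[i0 % n] * t
        d1 = grads[(i0 + 1) % n] * (t - 1.0)
        # Smooth interpolation; the value is zero at every integer x.
        return d0 + fade(t) * (d1 - d0)

    return noise


def perturb_latents(z, steps, amplitude, frequency=0.1, seed=0):
    """Return a smoothly varying trajectory of latent vectors around z."""
    # One independent noise function per latent dimension.
    noises = [make_perlin_1d(seed + k) for k in range(len(z))]
    return [
        [zk + amplitude * noises[k](step * frequency) for k, zk in enumerate(z)]
        for step in range(steps)
    ]


# amplitude = 0 keeps the base latent vector; larger values drift further.
unchanged = perturb_latents([0.5, -1.0], steps=50, amplitude=0.0)
varied = perturb_latents([0.0], steps=30, amplitude=2.0)
```

In a full system each perturbed vector would be passed to the decoder to produce one segment of the new melody; that step is left out here because it depends on the trained model.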

    Poster: Rosario "Ciudad Candia"

    The general objective is to give visibility to the company's output, which will allow a cross-sectional survey of local architectural development, weaving together historical periods, designers, and construction techniques. It is notable that the historiography of architecture favors citing the designer, relegating to the background the builders who contributed their empirical and practical knowledge to the construction of the city. Affiliation: Secretaria de Ciencia y Tecnología - Universidad Nacional de Rosario. Facultad de Arquitectura, Planeamiento y Diseño; Argentina

    CARMA1 is a critical lipid raft-associated regulator of TCR-induced NF-kappa B activation.

    CARMA1 is a lymphocyte-specific member of the membrane-associated guanylate kinase (MAGUK) family of scaffolding proteins, which coordinate signaling pathways emanating from the plasma membrane. CARMA1 interacts with Bcl10 via its caspase-recruitment domain (CARD). Here we investigated the role of CARMA1 in T cell activation and found that T cell receptor (TCR) stimulation induced a physical association of CARMA1 with the TCR and Bcl10. We found that CARMA1 was constitutively associated with lipid rafts, whereas cytoplasmic Bcl10 translocated into lipid rafts upon TCR engagement. A CARMA1 mutant, defective for Bcl10 binding, had a dominant-negative (DN) effect on TCR-induced NF-kappa B activation and IL-2 production and on the c-Jun NH(2)-terminal kinase (Jnk) pathway when the TCR was coengaged with CD28. Together, our data show that CARMA1 is a critical lipid raft-associated regulator of TCR-induced NF-kappa B activation and CD28 costimulation-dependent Jnk activation.